136 research outputs found

    A Bi-Criteria Algorithm for Scheduling Parallel Task Graphs on Clusters

    Applications structured as parallel task graphs exhibit both data and task parallelism, and arise in many domains. Scheduling these applications on parallel platforms has been a long-standing challenge. In the case of a single homogeneous cluster, most existing algorithms focus on reducing the application completion time (makespan). However, in the presence of resource managers such as batch schedulers, and given growing pressure from energy concerns, the produced schedules also have to be efficient in terms of resource usage. In this paper we propose a novel bi-criteria algorithm, called biCPA, able to optimize these two performance metrics either simultaneously or separately. Using simulation over a wide range of experimental scenarios, we find that biCPA leads to better results than previously published algorithms.

    Impact of Mixed-Parallelism on Parallel Implementations of Strassen and Winograd Matrix Multiplication Algorithms

    In this paper we study the impact of the simultaneous exploitation of data- and task-parallelism on the Strassen and Winograd matrix multiplication algorithms. We present two mixed-parallel implementations. The former follows the phases of the original algorithms, while the latter has been designed as the result of a list scheduling algorithm. We give a theoretical comparison, in terms of memory usage and execution time, between our algorithms and classical data-parallel implementations. This analysis is corroborated by experiments. Finally, we give some hints about a heterogeneous version of our algorithms.

    Improving the Accuracy and Efficiency of Time-Independent Trace Replay

    Simulation is a popular approach to obtain objective performance indicators on platforms that are not at one's disposal. It may help the dimensioning of compute clusters in large computing centers. In a previous work, we proposed a framework for the off-line simulation of MPI applications. Its main originality with regard to the literature is to rely on time-independent execution traces. This allows us to completely decouple the acquisition process from the actual replay of the traces in a simulation context. We are thus able to acquire traces for large application instances without being limited to an execution on a single compute cluster. Finally, our framework is built on top of a scalable, fast, and validated simulation kernel. In this paper, we detail the performance issues that we encountered with the first implementation of our trace replay framework. We propose several modifications to address these issues and analyze their impact. Results show a clear improvement in accuracy and efficiency with regard to the initial implementation.

    Evaluation of Profiling Tools for the Acquisition of Time Independent Traces

    In a previous work, we proposed a framework for the off-line simulation of MPI applications. Its main originality with regard to the literature is to rely on time-independent execution traces. Time-independent traces are an original way to estimate the performance of parallel applications. To acquire time-independent traces of the execution of MPI applications, we have to instrument them to log the necessary information. Many profiling tools exist that can instrument an application. In this report we propose a scoring system that corresponds to the specific requirements of our framework and use it to evaluate the most well-known open-source profiling tools. Furthermore, we introduce an original tool called Minimal Instrumentation that was designed to fulfill the requirements of our framework.

    Dynamic Performance Forecasting for Network-Enabled Servers in a Heterogeneous Environment

    This paper presents a tool for dynamic forecasting of the performance of Network-Enabled Servers. FAST (Fast Agent's System Timer) is a software package allowing client applications to get an accurate forecast of communication and computation times and memory use in a heterogeneous environment. It relies on low-level software packages, i.e., network and host monitoring tools, and on some of our developments in the modeling of computation routines. The FAST internals and user interface are presented, and a comparison is given between the execution time predicted by FAST and the measured time of a complex matrix multiplication executed on a heterogeneous platform.

    One-Step Algorithm for Mixed Data and Task Parallel Scheduling Without Data Replication

    In this paper we propose an original algorithm for mixed data and task parallel scheduling. The main specificities of this algorithm are that it performs the allocation and scheduling processes simultaneously and avoids data replication. The idea is to base the scheduling on an accurate evaluation of each task of the application depending on the processor grid. No assumption is therefore made with regard to the homogeneity of the execution platform. The complexity of our algorithm is given. The performance achieved by our schedules, in both homogeneous and heterogeneous environments, is compared to data-parallel executions for two applications: the complex matrix multiplication and the Strassen decomposition.

    Time-Independent Trace Acquisition Framework -- A Grid'5000 How-to

    This manual describes step by step how to create a Grid'5000 appliance that comprises all the tools needed to acquire time-independent traces of the execution of an MPI application. Time-independent traces are an original way to estimate the performance of parallel applications. They allow the acquisition of a trace to be totally decoupled from its replay in a simulation framework. This manual also details the different acquisition scenarios allowed by this approach. Traces can be acquired in a very classical way, by folding the execution onto fewer resources, or by scattering the execution across multiple clusters.

    SimGrid: a Sustained Effort for the Versatile Simulation of Large Scale Distributed Systems

    In this paper we present SimGrid, a toolkit for the versatile simulation of large scale distributed systems, whose development effort has been sustained for the last fifteen years. Over this time period SimGrid has evolved from a one-laboratory project in the U.S. into a scientific instrument developed by an international collaboration. The keys to making this evolution possible have been securing funding, improving the quality of the software, and increasing the user base. In this paper we describe how we have been able to make advances on all three fronts, on which we plan to intensify our efforts over the upcoming years. Comment: 4 pages, submission to WSSSPE'1

    Assessing the Performance of MPI Applications Through Time-Independent Trace Replay

    Simulation is a popular approach to obtain objective performance indicators on platforms that are not at one's disposal. It may help the dimensioning of compute clusters in large computing centers. In this work we present a framework for the off-line simulation of MPI applications. Its main originality with regard to the literature is to rely on time-independent execution traces. This allows us to completely decouple the acquisition process from the actual replay of the traces in a simulation context. We are thus able to acquire traces for large application instances without being limited to an execution on a single compute cluster. Finally, our framework is built on top of a scalable, fast, and validated simulation kernel. In this paper, we present the time-independent trace format used, investigate several acquisition strategies, detail the trace replay tool we developed, and assess the quality of our simulation framework in terms of accuracy, acquisition time, simulation time, and trace size.

    Budget Constrained Resource Allocation for Non-Deterministic Workflows on an IaaS Cloud

    Many scientific applications are described through workflow structures. Due to the increasing level of parallelism offered by modern computing infrastructures, workflow applications now have to be composed not only of sequential programs, but also of parallel ones. Cloud platforms bring on-demand resource provisioning and pay-as-you-go billing. The execution of a workflow therefore corresponds to a certain budget. The current work addresses the problem of resource allocation for non-deterministic workflows under budget constraints. We present a way of transforming the initial problem into sub-problems that have been studied before. We propose two new allocation algorithms that are capable of determining resource allocations under budget constraints, and we present ways of using them to address the problem at hand.